Method and system for cache eviction

ABSTRACT

The proposed system and associated algorithm, when implemented, improve processor cache miss rates and overall cache efficiency in multi-core environments in which multiple CPUs share a single cache structure (as an example). Cache efficiency is improved by tracking CPU core loading patterns, such as miss rate and minimum cache line load threshold levels. This information, used along with an existing cache eviction method such as LRU, determines which cache line from which CPU is evicted from the shared cache when a capacity conflict arises. This methodology allows shared cache entries to be allocated dynamically to each core within the socket based on that core's frequency of shared cache usage.

This is a continuation of another Accelerated Examination application, Ser. No. 12/020,531, filed Jan. 26, 2008, to be issued in November 2008 as a U.S. patent, with the same title, inventors, and assignee, IBM.

BACKGROUND OF THE INVENTION

Caches are fast memory modules, often on the same chip and close to the central processing unit (CPU). Data and instructions used by the CPU are loaded in cache. The benefit of using cache is that the same data or next instructions (also loaded in cache) are readily available to the CPU and do not have to be loaded from a slower main memory. When a CPU needs data which is already in the cache, it is called a “hit”; if the data is not in cache and needs to be loaded from memory, it is called a “miss”. For better performance, it is desired to increase hits and reduce misses.

Caches also exist at various levels: L1 is the fastest and closest to the CPU; L2 feeds L1 and is not as fast; and so on. LLC stands for Last-Level Cache, which is farthest from the CPU but often on the same chip or on the next module.

When the processor is composed of multiple CPUs, the CPU cores may be sharing a cache with limited space. Lines of data previously loaded in cache may have to be evicted to make room for new data to be loaded to the cache. Simple cache algorithms that clear out cache lines to make room for new data, such as Least Recently Used (LRU) or Least Frequently Used (LFU), do not track individual CPU core loading patterns. When one CPU is much busier than another CPU, using these simple algorithms to determine cache line eviction priority can increase cache miss rates and hurt cache efficiency. The proposed algorithm will improve processor cache miss rates and overall cache efficiency in multi-core environments in which multiple CPUs share a single cache structure, most often on a single die.

SUMMARY OF THE INVENTION

This is a system targeted (as one embodiment) for multiple core CPUs sharing one shared cache. It describes an eviction method based on the CPU usage pattern of cache combined with known methods such as LRU, and determines the CPU core which should be targeted for LRU. Once a shared cache load command is received, the cache area is examined. If the cache area is not full, the cache line is marked with the requester's CPU ID and loaded in the shared cache, and the cache load tracker (CLT) count for the requester's CPU is incremented and examined against a threshold. If this count exceeds the threshold, all such count totals (for all CPUs) are reduced proportionally to protect them against an overflow.

If the cache area is full, the cache line load (CLL) counts for all the CPUs are compared, and if they are all equal, the LRU cache line for a CPU not responsible for the load is evicted. If the CLL counts for all the CPUs are not equal, the requesting CPU's cache performance thresholds, specifically the CLL minimum and the miss rate, are examined. If the CLL minimum is exceeded or the miss rate is above the threshold, the LRU cache line loaded by the CPU with the highest CLL count is evicted. Otherwise, the LRU cache line for the CPU with the lowest load rate is evicted.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is the flow diagram for cache eviction system.

FIG. 2 is the continuation of the cache eviction system flow diagram.

FIG. 3 is the load counter reset algorithm for this system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This system (as one embodiment) describes a cache eviction method based on the CPU usage pattern of cache combined with known methods such as LRU, and determines which CPU core should be targeted for LRU. This system is proposed for multiple core CPUs sharing a cache load tracker (CLT) module for the shared cache.

Every cache line loaded to cache is marked with the ID of the CPU core requesting its load, so that cache lines can later be identified and evicted based on their original requesting CPU core or any identified CPU.

A cache load tracker (CLT) module is responsible for tracking the number of cache lines loaded (CLL) into cache per individual CPU core. The CLT also measures the load or miss rate on a separate set of memory locations for each CPU core.

One way to achieve this is based on a running average over a predetermined number of most recent misses. For example, the miss rate for each CPU core will be the number of loads requested by that core within the last predetermined number of misses. This requires storing the core IDs in the same predetermined number of first-in-first-out (FIFO) queue memory locations, in order to increase the rate for the latest requesting core and decrease the rate for the last outgoing one in the queue.
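The FIFO-based running rate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `SlidingMissRate`, the `window` parameter, and the use of raw counts (rather than a normalized fraction) as the "rate" are all hypothetical choices made for the example.

```python
from collections import Counter, deque


class SlidingMissRate:
    """Per-core miss counts over the most recent `window` misses (hypothetical helper)."""

    def __init__(self, window):
        self.window = window      # the predetermined number of recent misses
        self.fifo = deque()       # core IDs of recent misses, oldest first
        self.rate = Counter()     # core ID -> misses currently inside the window

    def record_miss(self, core_id):
        # Increase the rate for the latest requesting core.
        self.fifo.append(core_id)
        self.rate[core_id] += 1
        # Decrease the rate for the core ID leaving the queue.
        if len(self.fifo) > self.window:
            oldest = self.fifo.popleft()
            self.rate[oldest] -= 1


tracker = SlidingMissRate(window=4)
for core in [0, 0, 1, 0, 2, 2]:
    tracker.record_miss(core)
```

After the six misses above, only the last four core IDs remain in the queue, so each core's rate reflects its share of the most recent window rather than its lifetime total.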

Alternatively, the rates per core could be based on the number of misses per core in a predetermined number of misses, with the rates updated only after a predetermined number of misses have been accumulated. This requires a counter corresponding to each core for counting that core's miss events, and, every time a predetermined number of misses are accumulated, transferring the values of the counters to the core miss rate memory locations and resetting the counters to zero.
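The batched alternative can be sketched the same way. The names `BatchedMissRate`, `batch`, `counters`, and `rates` are assumptions introduced for illustration; the text only specifies per-core counters that are copied to the rate locations and zeroed once the predetermined number of misses has accumulated.

```python
class BatchedMissRate:
    """Per-core miss rates published once every `batch` misses (hypothetical helper)."""

    def __init__(self, num_cores, batch):
        self.batch = batch                  # the predetermined number of misses
        self.counters = [0] * num_cores     # running per-core miss counters
        self.rates = [0] * num_cores        # published per-core miss rates
        self.total = 0                      # misses accumulated in this batch

    def record_miss(self, core_id):
        self.counters[core_id] += 1
        self.total += 1
        if self.total == self.batch:
            # Transfer the counters to the rate locations, then reset them.
            self.rates = self.counters
            self.counters = [0] * len(self.rates)
            self.total = 0


batched = BatchedMissRate(num_cores=2, batch=4)
for core in [0, 0, 1, 0, 1]:
    batched.record_miss(core)
```

Compared with the FIFO variant, this needs no queue of core IDs, at the cost of rates that only change at batch boundaries.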

The steps for this new eviction method are depicted in FIGS. 1, 2 and 3 and are as follows. Once a shared cache load command is received, the cache area is examined in FIG. 1, step 110, to determine if this area is full. If the cache area is not full, the cache line can be loaded (FIG. 1, step 120) and the steps of the algorithm depicted in FIG. 3 are executed: the cache line is marked with the requester's CPU ID and loaded in the shared cache, as depicted in FIG. 3, step 310. The CLL count of the cache load tracker (CLT) for the requester's CPU is incremented in step 312 of FIG. 3. In step 314 of FIG. 3, this counter is examined to determine if its value has exceeded a threshold value. If the value has not exceeded the threshold, the process is complete and the algorithm ends. If the value has exceeded the threshold value, all such count totals (for all CPUs) are reduced proportionally (FIG. 3, step 316). This is required to protect the CLT counters against unwanted overflow.
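The load path of FIG. 3 can be sketched as follows. The text does not state the proportion by which the counts are reduced, so halving every count is an assumption made only for this example; the function name `load_line` and the dictionary-based `cache` and `cll` structures are likewise hypothetical.

```python
def load_line(cache, cll, address, core_id, threshold):
    """Load a line into a non-full cache and update the per-core CLL counts."""
    cache[address] = core_id          # step 310: mark the line with the requester's ID
    cll[core_id] += 1                 # step 312: increment the requester's CLL count
    if cll[core_id] > threshold:      # step 314: has the count exceeded the threshold?
        for core in cll:              # step 316: reduce all counts proportionally
            cll[core] //= 2           # assumption: halving, with integer rounding


cache = {}
cll = {0: 7, 1: 4}
load_line(cache, cll, address=0x1000, core_id=0, threshold=7)
```

Because every count is scaled by the same factor, the ordering of cores by CLL count is preserved (up to integer rounding), which is what the later eviction decision depends on.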

Returning to FIG. 1, step 110, if the shared cache area is full, the CLL counts for all the CPUs are examined (FIG. 1, step 112). If the CLL counts for all the CPUs are equal (FIG. 1, step 114), the LRU algorithm is used, a cache line from any of the other CPUs not responsible for this load is evicted (FIG. 1, step 116), and the CLL counter for that CPU is decremented (FIG. 1, step 118). At this stage the algorithm ends, as the new cache line can be loaded.

If the CLL counts for all the CPUs are not equal (FIG. 1, step 114), the requesting CPU's cache performance thresholds are examined in FIG. 2, step 210. The first such threshold is the CLT's CLL minimum threshold value.

If the CLT's CLL counter is more than the minimum threshold value (FIG. 2, step 212), LRU is applied to the cache lines corresponding to the CPU with the highest CLL count in the CLT (FIG. 2, step 216), a cache line is evicted, and the CLL count for that CPU is decremented (FIG. 2, step 220). At this stage the algorithm ends, as the new cache line can be loaded.

If the CLT's CLL count is not more than the minimum threshold value, the second CPU cache performance value, the miss rate, is examined (FIG. 2, step 214). If the miss rate is above the threshold value, LRU is applied to the cache lines corresponding to the CPU with the highest CLL count in the CLT (FIG. 2, step 216), a cache line is evicted, and the CLL count for that CPU is decremented (FIG. 2, step 220). At this stage the algorithm ends, as the new cache line can be loaded. If the miss rate is not above the threshold value, LRU is applied to the cache lines corresponding to the CPU with the lowest load rate (FIG. 2, step 218), a cache line is evicted, and the CLL count for that CPU is decremented (FIG. 2, step 220). At this stage the algorithm ends, as the new cache line can be loaded.
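The full-cache decision of FIGS. 1 and 2 can be sketched as two small functions: one choosing which core or cores to target, and one applying LRU within the lines owned by those cores. All names here (`candidate_cores`, `evict_lru`, `cll_min`, `miss_max`) are hypothetical, the load rate and miss rate are passed as separate inputs as an assumption, and the sketch assumes at least two cores. The text does not say what happens if the targeted core owns no lines, so the sketch simply returns `None` in that case.

```python
def candidate_cores(requester, cll, load_rate, miss_rate, cll_min, miss_max):
    """Return the core IDs whose cache lines are eligible for eviction."""
    if len(set(cll.values())) == 1:
        # Step 114/116: all CLL counts equal, target any core but the requester.
        return [core for core in cll if core != requester]
    if cll[requester] > cll_min or miss_rate[requester] > miss_max:
        # Steps 212/214/216: target the core with the highest CLL count.
        return [max(cll, key=cll.get)]
    # Step 218: otherwise target the core with the lowest load rate.
    return [min(load_rate, key=load_rate.get)]


def evict_lru(lines, cores, cll):
    """Evict the least recently used line owned by one of `cores`.

    `lines` is a list of (address, owner) pairs ordered from least to most
    recently used (an assumed representation of LRU state).
    """
    for i, (address, owner) in enumerate(lines):
        if owner in cores:
            del lines[i]
            cll[owner] -= 1   # steps 118/220: decrement the evicted core's CLL count
            return address
    return None               # no eligible line found (case not covered by the text)


lines = [(0x10, 1), (0x20, 0), (0x30, 0)]
cll = {0: 2, 1: 1}
targets = candidate_cores(requester=1, cll=cll, load_rate={0: 6, 1: 2},
                          miss_rate={0: 1, 1: 5}, cll_min=2, miss_max=4)
victim = evict_lru(lines, targets, cll)
```

In this example core 1 is under its CLL minimum but its miss rate is above the threshold, so the core holding the most lines (core 0) gives up its least recently used line.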

Another embodiment of this invention is a method of cache eviction for a multiple core central processing unit, comprising: a multiple core central processing unit sharing a last-level cache; loading a first cache line to a first cache; a first core among the multiple core central processing unit requesting a load in the first cache; wherein the first core has an identification number; marking the first cache line with the identification number of the first core; a cache load tracker keeping track of numbers of cache lines loaded into a cache per individual core among the multiple core central processing unit.

If a first number among the numbers of cache lines loaded into a cache per individual core exceeds a first threshold, reducing all the numbers of cache lines loaded into a cache per individual core other than the first number, proportionally, such that the cache load tracker is not overflowed; the cache load tracker further measuring load rate and miss rate; the cache load tracker recording the load rate and the miss rate on separate memory locations; the cache load tracker taking a running average over a first predetermined number of the most recent misses; storing the identification number of the first core in a second predetermined number of first-in-first-out queue of first memory locations.

If the first cache is full, evicting a second cache line by applying a least-recently-used filtering method on the second cache line. If the first number among the numbers of cache lines loaded into a cache per individual core exceeds a second threshold or the miss rate exceeds a third threshold, applying the least-recently-used filtering method on cache lines corresponding to a core with the largest number among the numbers of cache lines loaded into a cache per individual core. If the first number among the numbers of cache lines loaded into a cache per individual core does not exceed a second threshold and the miss rate does not exceed a third threshold, applying the least-recently-used filtering method on cache lines corresponding to a core with the lowest miss rate.

Any variations of the above teaching are also intended to be covered by this patent application. This can apply to a system, apparatus, or device with cache for a microprocessor, processor, server, PC, or mobile device, applying the method above.

1. A system of cache eviction, said system comprising: a multiple core central processing unit; and a first cache; wherein said multiple core central processing unit shares a last-level cache; said first cache line is loaded to a first cache; a first core among said multiple core central processing unit requests a load in said first cache; said first core has an identification number; said first cache line is marked with said identification number of said first core; a cache load tracker keeps track of counts of cache lines loaded into said first cache for each individual core among said multiple core central processing unit; when a count of said first core's cache lines loaded into said first cache exceeds a first threshold, all said counts of cache lines loaded into said first cache are reduced for each individual core, proportionally, such that said cache load tracker is not overflowed; said cache load tracker further measures load rate and miss rate for each individual core; said cache load tracker records said load rate and said miss rate on separate memory locations; said cache load tracker takes a running average over a first predetermined number of most recent misses; said identification number of said first core is stored in a second predetermined number of first-in-first-out queues of first memory locations; when said first cache is full and all said counts of cache lines loaded into said first cache for each individual core are equal, a least-recently-used cache line corresponding to any core not responsible for said load in said first cache is evicted; when said count of said first core's cache lines loaded into said first cache exceeds a second threshold or said miss rate for said first core exceeds a third threshold, a least-recently-used cache line corresponding to a core with the largest said count of cache lines loaded into said first cache is evicted; and when said count of said first core's cache lines loaded into said first cache does not exceed said second threshold and said miss rate for said first core does not exceed said third threshold, a least-recently-used cache line corresponding to a core with the lowest load rate is evicted.